{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 智能体 RAG：通过查询重构和自查询来增强你的 RAG ！🚀\n",
    "\n",
    "_作者: [Aymeric Roucher](https://huggingface.co/m-ric)_\n",
    "\n",
    ">这个教程比较高级，建议你先看看另一个[更基础的教程](advanced_rag)。\n",
    "\n",
    ">检索增强生成（RAG）是一种用大型语言模型（LLM）来回答问题的方法，但它会先从知识库中查找相关信息。这种方法比只用大型语言模型有很多好处，比如可以基于真实的事实来回答问题，减少虚构内容，还可以让模型获取特定领域的知识，并且可以精确控制模型从知识库中获取信息。\n",
    "\n",
    "不过，普通的RAG方法有两个主要问题：\n",
    "\n",
    "- 它只进行**一次信息检索**，如果检索的结果不好，那么回答也会差。\n",
    "- 它计算**语义相似性时是以用户的提问为参照**，这可能不太理想。比如，用户提出的问题通常是用疑问句，而包含答案的文档通常是陈述句，这样就会导致真正含有答案的文档和用户提问的相似性得分不高，可能会错过重要的信息。\n",
    "\n",
    "为了解决这些问题，我们可以创建**一个带有检索功能的 RAG 智能体。**\n",
    "\n",
    "这个智能体可以 ✅ 自己构建查询，并且 ✅ 在需要的时候重新检索信息。\n",
    "\n",
    "所以，我们得用点高级的 RAG 技术！\n",
    "\n",
    "- 不直接使用用户的提问去搜索，而是智能体自行制定一个更接近目标文档的参考句子，就像 [HyDE](https://huggingface.co/papers/2212.10496) 那样\n",
    "- 智能体能生成片段并在需要时重新检索，就像 [Self-Query](https://docs.llamaindex.ai/en/stable/examples/evaluation/RetryQuery/) 那样\n",
    "\n",
    "让我们开始做这个系统吧。🛠️\n",
    "\n",
    "运行下面的命令来安装所需的软件包：\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install pandas langchain langchain-community sentence-transformers faiss-cpu smolagents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们首先加载一个知识库，以便在其上执行 RAG：这个数据集是许多 `huggingface` 软件包的文档页面的汇总，以 markdown 格式存储。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/aymeric/Documents/Code/cookbook/.venv/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "import datasets\n",
    "\n",
    "knowledge_base = datasets.load_dataset(\"m-ric/huggingface_doc\", split=\"train\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在我们通过处理数据集并将其存储到向量数据库中，为检索器准备知识库。我们使用 [LangChain](https://python.langchain.com/)，因为它具有出色的向量数据库工具。对于嵌入模型，我们使用 [thenlper/gte-small](https://huggingface.co/thenlper/gte-small)，因为它在我们的 `RAG_evaluation` 指南中表现良好。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Splitting documents...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 2647/2647 [00:34<00:00, 76.04it/s] \n",
      "/Users/aymeric/Documents/Code/cookbook/.venv/lib/python3.12/site-packages/langchain_core/_api/deprecation.py:139: LangChainDeprecationWarning: The class `HuggingFaceEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 0.3.0. An updated version of the class exists in the langchain-huggingface package and should be used instead. To use it run `pip install -U langchain-huggingface` and import as `from langchain_huggingface import HuggingFaceEmbeddings`.\n",
      "  warn_deprecated(\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Embedding documents... This should take a few minutes (5 minutes on MacBook with M1 Pro)\n"
     ]
    }
   ],
   "source": [
    "from transformers import AutoTokenizer\n",
    "from langchain.docstore.document import Document\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "from langchain.vectorstores import FAISS\n",
    "from langchain_community.embeddings import HuggingFaceEmbeddings\n",
    "from langchain_community.vectorstores.utils import DistanceStrategy\n",
    "from tqdm import tqdm\n",
    "\n",
    "source_docs = [\n",
    "    Document(page_content=doc[\"text\"], metadata={\"source\": doc[\"source\"].split(\"/\")[1]})\n",
    "    for doc in knowledge_base\n",
    "]\n",
    "\n",
    "text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(\n",
    "    AutoTokenizer.from_pretrained(\"thenlper/gte-small\"),\n",
    "    chunk_size=200,\n",
    "    chunk_overlap=20,\n",
    "    add_start_index=True,\n",
    "    strip_whitespace=True,\n",
    "    separators=[\"\\n\\n\", \"\\n\", \".\", \" \", \"\"],\n",
    ")\n",
    "\n",
    "# Split docs and keep only unique ones\n",
    "print(\"Splitting documents...\")\n",
    "docs_processed = []\n",
    "unique_texts = {}\n",
    "for doc in tqdm(source_docs):\n",
    "    new_docs = text_splitter.split_documents([doc])\n",
    "    for new_doc in new_docs:\n",
    "        if new_doc.page_content not in unique_texts:\n",
    "            unique_texts[doc.page_content] = True\n",
    "            docs_processed.append(new_doc)\n",
    "\n",
    "print(\n",
    "    \"Embedding documents... This should take a few minutes (5 minutes on MacBook with M1 Pro)\"\n",
    ")\n",
    "embedding_model = HuggingFaceEmbeddings(model_name=\"thenlper/gte-small\")\n",
    "vectordb = FAISS.from_documents(\n",
    "    documents=docs_processed,\n",
    "    embedding=embedding_model,\n",
    "    distance_strategy=DistanceStrategy.COSINE,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在数据库已经准备好了：让我们构建我们的智能体 RAG 系统吧！\n",
    "\n",
    "👉 我们只需要一个 `RetrieverTool`，我们的智能体可以利用它从知识库中检索信息。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from smolagents import Tool\n",
    "from langchain_core.vectorstores import VectorStore\n",
    "\n",
    "\n",
    "class RetrieverTool(Tool):\n",
    "    name = \"retriever\"\n",
    "    description = \"Using semantic similarity, retrieves some documents from the knowledge base that have the closest embeddings to the input query.\"\n",
    "    inputs = {\n",
    "        \"query\": {\n",
    "            \"type\": \"text\",\n",
    "            \"description\": \"The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.\",\n",
    "        }\n",
    "    }\n",
    "    output_type = \"text\"\n",
    "\n",
    "    def __init__(self, vectordb: VectorStore, **kwargs):\n",
    "        super().__init__(**kwargs)\n",
    "        self.vectordb = vectordb\n",
    "\n",
    "    def forward(self, query: str) -> str:\n",
    "        assert isinstance(query, str), \"Your search query must be a string\"\n",
    "\n",
    "        docs = self.vectordb.similarity_search(\n",
    "            query,\n",
    "            k=7,\n",
    "        )\n",
    "\n",
    "        return \"\\nRetrieved documents:\\n\" + \"\".join(\n",
    "            [\n",
    "                f\"===== Document {str(i)} =====\\n\" + doc.page_content\n",
    "                for i, doc in enumerate(docs)\n",
    "            ]\n",
    "        )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在创建一个利用这个工具的智能体就简单了！\n",
    "\n",
    "智能体在初始化时需要以下参数：\n",
    "- *`tools`*：智能体能够调用的工具列表。\n",
    "- *`llm_engine`*：为智能体提供动力的LLM。\n",
    "\n",
    "我们的 `llm_engine` 必须是一个可调用的对象，它接受一个 [messages](https://huggingface.co/docs/transformers/main/chat_templating) 列表作为输入并返回文本。它还需要接受一个 `stop_sequences` 参数，该参数指示何时停止生成。为了方便起见，我们直接使用包中提供的 `HfModel` 类来获取一个调用我们的 [Inference API](https://huggingface.co/docs/api-inference/en/index) 的 LLM 引擎。\n",
    "我们使用 [CohereForAI/c4ai-command-r-plus](https://huggingface.co/CohereForAI/c4ai-command-r-plus) 作为 llm 引擎，因为：\n",
    "- 它有一个长达 128k 的上下文，这对于处理长源文档很有帮助\n",
    "- 它在 HF 的 Inference API 上始终免费提供！\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "from smolagents import HfModel, ToolCallingAgent\n",
    "\n",
    "model = HfModel(\"CohereForAI/c4ai-command-r-plus\")\n",
    "\n",
    "retriever_tool = RetrieverTool(vectordb)\n",
    "agent = ToolCallingAgent(\n",
    "    tools=[retriever_tool], model=model, max_iterations=4, verbose=2\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "既然我们已经将智能体初始化为 `ToolCallingAgent`，它就已经自动赋予了一个默认的系统提示，告诉 LLM 引擎要逐步处理并生成工具调用作为 JSON 块（你可以根据需要用你自己的提示模板替换这个）。\n",
    "\n",
    "然后，当它的 `.run()` 方法被启动时，智能体负责调用 LLM 引擎，解析工具调用的 JSON 块并执行这些工具调用，所有这些都在一个循环中进行，只有当提供最终答案时才会结束。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[33;1m======== New task ========\u001b[0m\n",
      "\u001b[37;1mHow can I push a model to the Hub?\u001b[0m\n",
      "\u001b[38;20mSystem prompt is as follows:\u001b[0m\n",
      "\u001b[38;20mYou are an expert assistant who can solve any task using JSON tool calls. You will be given a task to solve as best you can.\n",
      "To do so, you have been given access to the following tools: 'retriever', 'final_answer'\n",
      "The way you use the tools is by specifying a json blob, ending with '<end_action>'.\n",
      "Specifically, this json should have an `action` key (name of the tool to use) and an `action_input` key (input to the tool).\n",
      "\n",
      "The $ACTION_JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. It should be formatted in json. Do not try to escape special characters. Here is the template of a valid $ACTION_JSON_BLOB:\n",
      "{\n",
      "  \"action\": $TOOL_NAME,\n",
      "  \"action_input\": $INPUT\n",
      "}<end_action>\n",
      "\n",
      "Make sure to have the $INPUT as a dictionary in the right format for the tool you are using, and do not put variable names as input if you can find the right values.\n",
      "\n",
      "You should ALWAYS use the following format:\n",
      "\n",
      "Thought: you should always think about one action to take. Then use the action as follows:\n",
      "Action:\n",
      "$ACTION_JSON_BLOB\n",
      "Observation: the result of the action\n",
      "... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $ACTION_JSON_BLOB must only use a SINGLE action at a time.)\n",
      "\n",
      "You can use the result of the previous action as input for the next action.\n",
      "The observation will always be a string: it can represent a file, like \"image_1.jpg\".\n",
      "Then you can use it as input for the next action. You can do it for instance as follows:\n",
      "\n",
      "Observation: \"image_1.jpg\"\n",
      "\n",
      "Thought: I need to transform the image that I received in the previous observation to make it green.\n",
      "Action:\n",
      "{\n",
      "  \"action\": \"image_transformer\",\n",
      "  \"action_input\": {\"image\": \"image_1.jpg\"}\n",
      "}<end_action>\n",
      "\n",
      "To provide the final answer to the task, use an action blob with \"action\": \"final_answer\" tool. It is the only way to complete the task, else you will be stuck on a loop. So your final output should look like this:\n",
      "Action:\n",
      "{\n",
      "  \"action\": \"final_answer\",\n",
      "  \"action_input\": {\"answer\": \"insert your final answer here\"}\n",
      "}<end_action>\n",
      "\n",
      "\n",
      "Here are a few examples using notional tools:\n",
      "---\n",
      "Task: \"Generate an image of the oldest person in this document.\"\n",
      "\n",
      "Thought: I will proceed step by step and use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.\n",
      "Action:\n",
      "{\n",
      "  \"action\": \"document_qa\",\n",
      "  \"action_input\": {\"document\": \"document.pdf\", \"question\": \"Who is the oldest person mentioned?\"}\n",
      "}<end_action>\n",
      "Observation: \"The oldest person in the document is John Doe, a 55 year old lumberjack living in Newfoundland.\"\n",
      "\n",
      "\n",
      "Thought: I will now generate an image showcasing the oldest person.\n",
      "Action:\n",
      "{\n",
      "  \"action\": \"image_generator\",\n",
      "  \"action_input\": {\"text\": \"\"A portrait of John Doe, a 55-year-old man living in Canada.\"\"}\n",
      "}<end_action>\n",
      "Observation: \"image.png\"\n",
      "\n",
      "Thought: I will now return the generated image.\n",
      "Action:\n",
      "{\n",
      "  \"action\": \"final_answer\",\n",
      "  \"action_input\": \"image.png\"\n",
      "}<end_action>\n",
      "\n",
      "---\n",
      "Task: \"What is the result of the following operation: 5 + 3 + 1294.678?\"\n",
      "\n",
      "Thought: I will use python code evaluator to compute the result of the operation and then return the final answer using the `final_answer` tool\n",
      "Action:\n",
      "{\n",
      "    \"action\": \"python_interpreter\",\n",
      "    \"action_input\": {\"code\": \"5 + 3 + 1294.678\"}\n",
      "}<end_action>\n",
      "Observation: 1302.678\n",
      "\n",
      "Thought: Now that I know the result, I will now return it.\n",
      "Action:\n",
      "{\n",
      "  \"action\": \"final_answer\",\n",
      "  \"action_input\": \"1302.678\"\n",
      "}<end_action>\n",
      "\n",
      "---\n",
      "Task: \"Which city has the highest population , Guangzhou or Shanghai?\"\n",
      "\n",
      "Thought: I need to get the populations for both cities and compare them: I will use the tool `search` to get the population of both cities.\n",
      "Action:\n",
      "{\n",
      "    \"action\": \"search\",\n",
      "    \"action_input\": \"Population Guangzhou\"\n",
      "}<end_action>\n",
      "Observation: ['Guangzhou has a population of 15 million inhabitants as of 2021.']\n",
      "\n",
      "\n",
      "Thought: Now let's get the population of Shanghai using the tool 'search'.\n",
      "Action:\n",
      "{\n",
      "    \"action\": \"search\",\n",
      "    \"action_input\": \"Population Shanghai\"\n",
      "}\n",
      "Observation: '26 million (2019)'\n",
      "\n",
      "Thought: Now I know that Shanghai has a larger population. Let's return the result.\n",
      "Action:\n",
      "{\n",
      "  \"action\": \"final_answer\",\n",
      "  \"action_input\": \"Shanghai\"\n",
      "}<end_action>\n",
      "\n",
      "\n",
      "Above example were using notional tools that might not exist for you. You only have access to those tools:\n",
      "\n",
      "- retriever: Using semantic similarity, retrieves some documents from the knowledge base that have the closest embeddings to the input query.\n",
      "    Takes inputs: {'query': {'type': 'text', 'description': 'The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.'}}\n",
      "\n",
      "- final_answer: Provides a final answer to the given problem\n",
      "    Takes inputs: {'answer': {'type': 'text', 'description': 'The final answer to the problem'}}\n",
      "\n",
      "Here are the rules you should always follow to solve your task:\n",
      "1. ALWAYS provide a 'Thought:' sequence, and an 'Action:' sequence that ends with <end_action>, else you will fail.\n",
      "2. Always use the right arguments for the tools. Never use variable names in the 'action_input' field, use the value instead.\n",
      "3. Call a tool only when needed: do not call the search agent if you do not need information, try to solve the task yourself.\n",
      "4. Never re-do a tool call that you previously did with the exact same parameters.\n",
      "\n",
      "Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000.\n",
      "\u001b[0m\n",
      "\u001b[38;20m===== New step =====\u001b[0m\n",
      "===== Calling LLM with this last message: =====\n",
      "{'role': <MessageRole.USER: 'user'>, 'content': 'Task: How can I push a model to the Hub?'}\n",
      "\u001b[38;20m===== Output message of the LLM: =====\u001b[0m\n",
      "\u001b[38;20mThought: I can use the \"retriever\" tool to find documents relevant to the question, \"How can I push a model to the Hub?\" I will then read through the retrieved documents to find the relevant information and provide an answer to the question.\n",
      "\n",
      "Action: ```json\n",
      "{\n",
      "  \"action\": \"retriever\",\n",
      "  \"action_input\": {\n",
      "    \"query\": \"How can I push a model to the Hub?\"\n",
      "  }\n",
      "}\u001b[0m\n",
      "\u001b[38;20m===== Extracting action =====\u001b[0m\n",
      "\u001b[33;1mCalling tool: 'retriever' with arguments: {'query': 'How can I push a model to the Hub?'}\u001b[0m\n",
      "Retrieved documents:\n",
      "===== Document 0 =====\n",
      "# Step 7. Push everything to the Hub\n",
      "    api.upload_folder(\n",
      "        repo_id=repo_id,\n",
      "        folder_path=repo_local_path,\n",
      "        path_in_repo=\".\",\n",
      "    )\n",
      "\n",
      "    print(\"Your model is pushed to the Hub. You can view your model here: \", repo_url)\n",
      "```\n",
      "\n",
      "### .\n",
      "\n",
      "By using `push_to_hub` **you evaluate, record a replay, generate a model card of your agent and push it to the Hub**.===== Document 1 =====\n",
      "```py\n",
      ">>> trainer.push_to_hub()\n",
      "```\n",
      "</pt>\n",
      "<tf>\n",
      "Share a model to the Hub with [`PushToHubCallback`]. In the [`PushToHubCallback`] function, add:\n",
      "\n",
      "- An output directory for your model.\n",
      "- A tokenizer.\n",
      "- The `hub_model_id`, which is your Hub username and model name.\n",
      "\n",
      "```py\n",
      ">>> from transformers import PushToHubCallback\n",
      "\n",
      ">>> push_to_hub_callback = PushToHubCallback(\n",
      "...     output_dir=\"./your_model_save_path\", tokenizer=tokenizer, hub_model_id=\"your-username/my-awesome-model\"\n",
      "... )\n",
      "```===== Document 2 =====\n",
      "Let's pretend we've now fine-tuned the model. The next step would be to push it to the Hub! We can do this with the `timm.models.hub.push_to_hf_hub` function.\n",
      "\n",
      "```py\n",
      ">>> model_cfg = dict(labels=['a', 'b', 'c', 'd'])\n",
      ">>> timm.models.hub.push_to_hf_hub(model, 'resnet18-random', model_config=model_cfg)\n",
      "```\n",
      "\n",
      "Running the above would push the model to `<your-username>/resnet18-random` on the Hub. You can now share this model with your friends, or use it in your own code!\n",
      "\n",
      "## Loading a Model===== Document 3 =====\n",
      "processor.push_to_hub(hub_model_id)\n",
      "trainer.push_to_hub(**kwargs)\n",
      "```\n",
      "\n",
      "# 4. Inference\n",
      "\n",
      "Now comes the exciting part, using our fine-tuned model! In this section, we'll show how you can load your model from the hub and use it for inference.===== Document 4 =====\n",
      "--push_to_hub\n",
      "```===== Document 5 =====\n",
      ". The second way to upload a model, though, is to call model.push_to_hub(). So this is more of a once-off method - it's not called regularly during training. You can just call this manually whenever you want to upload a model to the hub. So we recommend running this after the end of training, just to make sure that you have a commit message just to guarantee that this was the final version of the model at the end of training. And it just makes sure that you're working with the definitive end-of-training model and not accidentally using a model that's from a checkpoint somewhere along the way===== Document 6 =====\n",
      "Finally, if you want, you can push your model up to the hub. Here, we'll push it up if you specified `push_to_hub=True` in the training configuration. Note that in order to push to hub, you'll have to have git-lfs installed and be logged into your Hugging Face account (which can be done via `huggingface-cli login`).\n",
      "\n",
      "```python\n",
      "kwargs = {\n",
      "    \"finetuned_from\": model.config._name_or_path,\n",
      "    \"tasks\": \"image-classification\",\n",
      "    \"dataset\": 'beans',\n",
      "    \"tags\": ['image-classification'],\n",
      "}\n",
      "\u001b[38;20m===== New step =====\u001b[0m\n",
      "===== Calling LLM with this last message: =====\n",
      "{'role': <MessageRole.TOOL_RESPONSE: 'tool-response'>, 'content': 'Observation: Retrieved documents:\\n===== Document 0 =====\\n# Step 7. Push everything to the Hub\\n    api.upload_folder(\\n        repo_id=repo_id,\\n        folder_path=repo_local_path,\\n        path_in_repo=\".\",\\n    )\\n\\n    print(\"Your model is pushed to the Hub. You can view your model here: \", repo_url)\\n```\\n\\n### .\\n\\nBy using `push_to_hub` **you evaluate, record a replay, generate a model card of your agent and push it to the Hub**.===== Document 1 =====\\n```py\\n>>> trainer.push_to_hub()\\n```\\n</pt>\\n<tf>\\nShare a model to the Hub with [`PushToHubCallback`]. In the [`PushToHubCallback`] function, add:\\n\\n- An output directory for your model.\\n- A tokenizer.\\n- The `hub_model_id`, which is your Hub username and model name.\\n\\n```py\\n>>> from transformers import PushToHubCallback\\n\\n>>> push_to_hub_callback = PushToHubCallback(\\n...     output_dir=\"./your_model_save_path\", tokenizer=tokenizer, hub_model_id=\"your-username/my-awesome-model\"\\n... )\\n```===== Document 2 =====\\nLet\\'s pretend we\\'ve now fine-tuned the model. The next step would be to push it to the Hub! We can do this with the `timm.models.hub.push_to_hf_hub` function.\\n\\n```py\\n>>> model_cfg = dict(labels=[\\'a\\', \\'b\\', \\'c\\', \\'d\\'])\\n>>> timm.models.hub.push_to_hf_hub(model, \\'resnet18-random\\', model_config=model_cfg)\\n```\\n\\nRunning the above would push the model to `<your-username>/resnet18-random` on the Hub. You can now share this model with your friends, or use it in your own code!\\n\\n## Loading a Model===== Document 3 =====\\nprocessor.push_to_hub(hub_model_id)\\ntrainer.push_to_hub(**kwargs)\\n```\\n\\n# 4. Inference\\n\\nNow comes the exciting part, using our fine-tuned model! In this section, we\\'ll show how you can load your model from the hub and use it for inference.===== Document 4 =====\\n--push_to_hub\\n```===== Document 5 =====\\n. The second way to upload a model, though, is to call model.push_to_hub(). So this is more of a once-off method - it\\'s not called regularly during training. You can just call this manually whenever you want to upload a model to the hub. So we recommend running this after the end of training, just to make sure that you have a commit message just to guarantee that this was the final version of the model at the end of training. And it just makes sure that you\\'re working with the definitive end-of-training model and not accidentally using a model that\\'s from a checkpoint somewhere along the way===== Document 6 =====\\nFinally, if you want, you can push your model up to the hub. Here, we\\'ll push it up if you specified `push_to_hub=True` in the training configuration. Note that in order to push to hub, you\\'ll have to have git-lfs installed and be logged into your Hugging Face account (which can be done via `huggingface-cli login`).\\n\\n```python\\nkwargs = {\\n    \"finetuned_from\": model.config._name_or_path,\\n    \"tasks\": \"image-classification\",\\n    \"dataset\": \\'beans\\',\\n    \"tags\": [\\'image-classification\\'],\\n}'}\n",
      "\u001b[38;20m===== Output message of the LLM: =====\u001b[0m\n",
      "\u001b[38;20mThought: I have found multiple ways to push a model to the Hub. In order to answer comprehensively, I will provide each code snippet that describes how to push a model to the Hub.\n",
      "\n",
      "Action: ```json\n",
      "{\n",
      "  \"action\": \"final_answer\",\n",
      "  \"action_input\": {\n",
      "    \"answer\": \"There are multiple ways to push a model to the Hub. Here are a few examples using different libraries and functions:\\n\\nUsing the `api`:\\n```python\\napi.upload_folder(\\n    repo_id=repo_id,\\n    folder_path=repo_local_path,\\n    path_in_repo=\\\".\\\",\\n)\\n\\nprint(\\\"Your model is pushed to the Hub. You can view your model here:\\\", repo_url)\\n```\\n\\nWith Transformers:\\n```python\\nfrom transformers import PushToHubCallback\\n\\n# Initialize the callback with the output directory,\\ntokenizer, and your Hub username and model name\\npush_to_hub_callback = PushToHubCallback(\\n    output_dir=\\\"./your_model_save_path\\\",\\n    tokenizer=tokenizer,\\n    hub_model_id=\\\"your-username/my-awesome-model\\\"\\n)\\n\\n# Assuming `trainer` is your Trainer object\\ntrainer.add_callback(push_to_hub_callback)\\n```\\n\\nUsing `timm`:\\n```python\\nfrom timm.models.hub import push_to_hf_hub\\n\\n# Assuming `model` is your fine-tuned model\\nmodel_cfg = {\\\"labels\\\": [\\\"a\\\", \\\"b\\\", \\\"c\\\", \\\"d\\\"]}\\npush_to_hf_hub(model, 'resnet18-random', model_config=model_cfg)\\n```\\n\\nFor computer vision models, you can also use `push_to_hub`:\\n```python\\nprocessor.push_to_hub(hub_model_id)\\ntrainer.push_to_hub(**kwargs)\\n```\\n\\nYou can also manually push a model with `model.push_to_hub()`:\\n```python\\nmodel.push_to_hub()\\n```\\n\\nAdditionally, you can opt to push your model to the Hub at the end of training by specifying `push_to_hub=True` in the training configuration. Don't forget to have git-lfs installed and be logged into your Hugging Face account.\"\n",
      "  }\n",
      "}\u001b[0m\n",
      "\u001b[38;20m===== Extracting action =====\u001b[0m\n",
      "\u001b[33;1mCalling tool: 'final_answer' with arguments: {'answer': \"There are multiple ways to push a model to the Hub. Here are a few examples using different libraries and functions:\\n\\nUsing the `api`:\\npython\\napi.upload_folder(\\n    repo_id=repo_id,\\n    folder_path=repo_local_path,\\n    path_in_repo='.',\\n)\\n\\nprint('Your model is pushed to the Hub. You can view your model here:', repo_url)\\n\\n\\nWith Transformers:\\npython\\nfrom transformers import PushToHubCallback\\n\\n# Initialize the callback with the output directory,\\ntokenizer, and your Hub username and model name\\npush_to_hub_callback = PushToHubCallback(\\n    output_dir='./your_model_save_path',\\n    tokenizer=tokenizer,\\n    hub_model_id='your-username/my-awesome-model'\\n)\\n\\n# Assuming `trainer` is your Trainer object\\ntrainer.add_callback(push_to_hub_callback)\\n\\n\\nUsing `timm`:\\npython\\nfrom timm.models.hub import push_to_hf_hub\\n\\n# Assuming `model` is your fine-tuned model\\nmodel_cfg = {'labels': ['a', 'b', 'c', 'd']}\\npush_to_hf_hub(model, 'resnet18-random', model_config=model_cfg)\\n\\n\\nFor computer vision models, you can also use `push_to_hub`:\\npython\\nprocessor.push_to_hub(hub_model_id)\\ntrainer.push_to_hub(**kwargs)\\n\\n\\nYou can also manually push a model with `model.push_to_hub()`:\\npython\\nmodel.push_to_hub()\\n\\n\\nAdditionally, you can opt to push your model to the Hub at the end of training by specifying `push_to_hub=True` in the training configuration. Don't forget to have git-lfs installed and be logged into your Hugging Face account.\"}\u001b[0m\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Final output:\n",
      "There are multiple ways to push a model to the Hub. Here are a few examples using different libraries and functions:\n",
      "\n",
      "Using the `api`:\n",
      "python\n",
      "api.upload_folder(\n",
      "    repo_id=repo_id,\n",
      "    folder_path=repo_local_path,\n",
      "    path_in_repo='.',\n",
      ")\n",
      "\n",
      "print('Your model is pushed to the Hub. You can view your model here:', repo_url)\n",
      "\n",
      "\n",
      "With Transformers:\n",
      "python\n",
      "from transformers import PushToHubCallback\n",
      "\n",
      "# Initialize the callback with the output directory,\n",
      "tokenizer, and your Hub username and model name\n",
      "push_to_hub_callback = PushToHubCallback(\n",
      "    output_dir='./your_model_save_path',\n",
      "    tokenizer=tokenizer,\n",
      "    hub_model_id='your-username/my-awesome-model'\n",
      ")\n",
      "\n",
      "# Assuming `trainer` is your Trainer object\n",
      "trainer.add_callback(push_to_hub_callback)\n",
      "\n",
      "\n",
      "Using `timm`:\n",
      "python\n",
      "from timm.models.hub import push_to_hf_hub\n",
      "\n",
      "# Assuming `model` is your fine-tuned model\n",
      "model_cfg = {'labels': ['a', 'b', 'c', 'd']}\n",
      "push_to_hf_hub(model, 'resnet18-random', model_config=model_cfg)\n",
      "\n",
      "\n",
      "For computer vision models, you can also use `push_to_hub`:\n",
      "python\n",
      "processor.push_to_hub(hub_model_id)\n",
      "trainer.push_to_hub(**kwargs)\n",
      "\n",
      "\n",
      "You can also manually push a model with `model.push_to_hub()`:\n",
      "python\n",
      "model.push_to_hub()\n",
      "\n",
      "\n",
      "Additionally, you can opt to push your model to the Hub at the end of training by specifying `push_to_hub=True` in the training configuration. Don't forget to have git-lfs installed and be logged into your Hugging Face account.\n"
     ]
    }
   ],
   "source": [
    "agent_output = agent.run(\"How can I push a model to the Hub?\")\n",
    "\n",
    "print(\"Final output:\")\n",
    "print(agent_output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 智能体RAG与标准RAG的比较\n",
    "\n",
    "智能体 RAG 和标准 RAG，哪个更好？我们用 LLM Judge 来比一比。\n",
    "\n",
    "我们会用一个非常强的模型 [meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) 来做这个评估。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "eval_dataset = datasets.load_dataset(\"m-ric/huggingface_doc_qa_eval\", split=\"train\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在运行测试之前，让我们让智能体输出更简洁一些。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "\n",
    "agent.logger.setLevel(logging.WARNING)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "outputs_agentic_rag = []\n",
    "\n",
    "for example in tqdm(eval_dataset):\n",
    "    question = example[\"question\"]\n",
    "\n",
    "    enhanced_question = f\"\"\"Using the information contained in your knowledge base, which you can access with the 'retriever' tool,\n",
    "give a comprehensive answer to the question below.\n",
    "Respond only to the question asked, response should be concise and relevant to the question.\n",
    "If you cannot find information, do not give up and try calling your retriever again with different arguments!\n",
    "Make sure to have covered the question completely by calling the retriever tool several times with semantically different queries.\n",
    "Your queries should not be questions but affirmative form sentences: e.g. rather than \"How do I load a model from the Hub in bf16?\", query should be \"load a model from the Hub bf16 weights\".\n",
    "\n",
    "Question:\n",
    "{question}\"\"\"\n",
    "    answer = agent.run(enhanced_question)\n",
    "    print(\"=======================================================\")\n",
    "    print(f\"Question: {question}\")\n",
    "    print(f\"Answer: {answer}\")\n",
    "    print(f'True answer: {example[\"answer\"]}')\n",
    "\n",
    "    results_agentic = {\n",
    "        \"question\": question,\n",
    "        \"true_answer\": example[\"answer\"],\n",
    "        \"source_doc\": example[\"source_doc\"],\n",
    "        \"generated_answer\": answer,\n",
    "    }\n",
    "    outputs_agentic_rag.append(results_agentic)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from huggingface_hub import InferenceClient\n",
    "\n",
    "reader_llm = InferenceClient(\"CohereForAI/c4ai-command-r-plus\")\n",
    "\n",
    "outputs_standard_rag = []\n",
    "\n",
    "for example in tqdm(eval_dataset):\n",
    "    question = example[\"question\"]\n",
    "    context = retriever_tool(question)\n",
    "\n",
    "    prompt = f\"\"\"Given the question and supporting documents below, give a comprehensive answer to the question.\n",
    "Respond only to the question asked, response should be concise and relevant to the question.\n",
    "Provide the number of the source document when relevant.\n",
    "If you cannot find information, do not give up and try calling your retriever again with different arguments!\n",
    "\n",
    "Question:\n",
    "{question}\n",
    "\n",
    "{context}\n",
    "\"\"\"\n",
    "    messages = [{\"role\": \"user\", \"content\": prompt}]\n",
    "    answer = reader_llm.chat_completion(messages).choices[0].message.content\n",
    "\n",
    "    print(\"=======================================================\")\n",
    "    print(f\"Question: {question}\")\n",
    "    print(f\"Answer: {answer}\")\n",
    "    print(f'True answer: {example[\"answer\"]}')\n",
    "\n",
    "    results_agentic = {\n",
    "        \"question\": question,\n",
    "        \"true_answer\": example[\"answer\"],\n",
    "        \"source_doc\": example[\"source_doc\"],\n",
    "        \"generated_answer\": answer,\n",
    "    }\n",
    "    outputs_standard_rag.append(results_agentic)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "评估提示遵循了[我们的 llm_judge cookbook](llm_judge) 中展示的一些最佳原则：它遵循一个小的整数李克特量表，有明确的评分标准和每个分数的描述。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [],
   "source": [
    "EVALUATION_PROMPT = \"\"\"You are a fair evaluator language model.\n",
    "\n",
    "You will be given an instruction, a response to evaluate, a reference answer that gets a score of 3, and a score rubric representing a evaluation criteria are given.\n",
    "1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general.\n",
    "2. After writing a feedback, write a score that is an integer between 1 and 3. You should refer to the score rubric.\n",
    "3. The output format should look as follows: \\\"Feedback: {{write a feedback for criteria}} [RESULT] {{an integer number between 1 and 3}}\\\"\n",
    "4. Please do not generate any other opening, closing, and explanations. Be sure to include [RESULT] in your output.\n",
    "5. Do not score conciseness: a correct answer that covers the question should receive max score, even if it contains additional useless information.\n",
    "\n",
    "The instruction to evaluate:\n",
    "{instruction}\n",
    "\n",
    "Response to evaluate:\n",
    "{response}\n",
    "\n",
    "Reference Answer (Score 3):\n",
    "{reference_answer}\n",
    "\n",
    "Score Rubrics:\n",
    "[Is the response complete, accurate, and factual based on the reference answer?]\n",
    "Score 1: The response is completely incomplete, inaccurate, and/or not factual.\n",
    "Score 2: The response is somewhat complete, accurate, and/or factual.\n",
    "Score 3: The response is completely complete, accurate, and/or factual.\n",
    "\n",
    "Feedback:\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [],
   "source": [
    "from huggingface_hub import InferenceClient\n",
    "\n",
    "evaluation_client = InferenceClient(\"meta-llama/Meta-Llama-3-70B-Instruct\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 65/65 [02:24<00:00,  2.23s/it]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average score for agentic RAG: 78.5%\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 65/65 [02:17<00:00,  2.12s/it]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average score for standard RAG: 70.0%\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "for type, outputs in [\n",
    "    (\"agentic\", outputs_agentic_rag),\n",
    "    (\"standard\", outputs_standard_rag),\n",
    "]:\n",
    "    for experiment in tqdm(outputs):\n",
    "        eval_prompt = EVALUATION_PROMPT.format(\n",
    "            instruction=experiment[\"question\"],\n",
    "            response=experiment[\"generated_answer\"],\n",
    "            reference_answer=experiment[\"true_answer\"],\n",
    "        )\n",
    "        messages = [\n",
    "            {\"role\": \"system\", \"content\": \"You are a fair evaluator language model.\"},\n",
    "            {\"role\": \"user\", \"content\": eval_prompt},\n",
    "        ]\n",
    "\n",
    "        eval_result = evaluation_client.text_generation(\n",
    "            eval_prompt, max_new_tokens=1000\n",
    "        )\n",
    "        try:\n",
    "            feedback, score = [item.strip() for item in eval_result.split(\"[RESULT]\")]\n",
    "            experiment[\"eval_score_LLM_judge\"] = score\n",
    "            experiment[\"eval_feedback_LLM_judge\"] = feedback\n",
    "        except:\n",
    "            print(f\"Parsing failed - output was: {eval_result}\")\n",
    "\n",
    "    results = pd.DataFrame.from_dict(outputs)\n",
    "    results = results.loc[~results[\"generated_answer\"].str.contains(\"Error\")]\n",
    "    results[\"eval_score_LLM_judge_int\"] = (\n",
    "        results[\"eval_score_LLM_judge\"].fillna(1).apply(lambda x: int(x))\n",
    "    )\n",
    "    results[\"eval_score_LLM_judge_int\"] = (results[\"eval_score_LLM_judge_int\"] - 1) / 2\n",
    "\n",
    "    print(\n",
    "        f\"Average score for {type} RAG: {results['eval_score_LLM_judge_int'].mean()*100:.1f}%\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**让我们回顾一下：与标准的 RAG 相比，智能体设置提高了 8.5% 的得分！**（从 70.0% 提高到 78.5%）\n",
    "\n",
    "这是一个巨大的改进，而且设置非常简单🚀\n",
    "\n",
    "（作为基准，不使用知识库的 Llama-3-70B 得分为 36%）\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "disposable",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
